PLUGIN-1413 Added argument for BQ temporary staging bucket names #1152
Conversation
```java
private static final Map<Schema.Type, Set<LegacySQLTypeName>> TYPE_MAP = ImmutableMap.<Schema.Type,
  Set<LegacySQLTypeName>>builder()
```
this change isn't needed
```java
if (Integer.parseInt(chunkSize) % MediaHttpUploader.MINIMUM_CHUNK_SIZE != 0) {
  collector.addFailure(
    String.format("Value must be a multiple of %s.", MediaHttpUploader.MINIMUM_CHUNK_SIZE), null)
```
Reverted this change.
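For context, the check in the snippet above can be sketched as a standalone helper. This is illustrative only: `ChunkSizeCheck` and `validateChunkSize` are hypothetical names, and `MINIMUM_CHUNK_SIZE` here mirrors the value of `MediaHttpUploader.MINIMUM_CHUNK_SIZE` (256 KiB) rather than importing the Google client library.

```java
// Hypothetical sketch of the chunk-size validation discussed above.
public class ChunkSizeCheck {
  // Same value as MediaHttpUploader.MINIMUM_CHUNK_SIZE (256 KiB); assumed here.
  static final int MINIMUM_CHUNK_SIZE = 262144;

  // Returns null when the value is valid, otherwise the failure message that
  // would be passed to the FailureCollector.
  static String validateChunkSize(String chunkSize) {
    try {
      if (Integer.parseInt(chunkSize) % MINIMUM_CHUNK_SIZE != 0) {
        return String.format("Value must be a multiple of %s.", MINIMUM_CHUNK_SIZE);
      }
    } catch (NumberFormatException e) {
      return "Value must be a number.";
    }
    return null;
  }
}
```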
```java
private static final Logger LOG = LoggerFactory.getLogger(BigQueryUtil.class);

private static final String DEFAULT_PARTITION_COLUMN_NAME = "_PARTITIONTIME";
private static final String BIGQUERY_BUCKET_PREFIX_PROPERTY_NAME = "io.cdap.plugin.bigquery.bucket.prefix";
```
We do a similar thing for CMEK, where the argument key is 'gcp.cmek.key.name' (see CmekUtils). Let's follow a similar pattern and name it something like 'gcp.bigquery.bucket.prefix'.
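The suggested pattern could look something like the following minimal sketch. The class and method names are hypothetical, and the lookup simply reads the proposed `gcp.bigquery.bucket.prefix` key from the runtime arguments map, analogous to how CmekUtils resolves 'gcp.cmek.key.name'.

```java
import java.util.Map;

// Hypothetical helper following the CmekUtils naming pattern.
public class BucketPrefixUtils {
  // Argument key proposed in the review comment above.
  public static final String BUCKET_PREFIX_KEY = "gcp.bigquery.bucket.prefix";

  // Look up the bucket prefix in the runtime arguments; treat a missing or
  // empty value as "not set".
  static String getBucketPrefix(Map<String, String> arguments) {
    String prefix = arguments.get(BUCKET_PREFIX_KEY);
    return (prefix == null || prefix.isEmpty()) ? null : prefix;
  }
}
```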
```java
 * We use this to ensure location name length is constant (only 8 characters).
 *
 * @param location location to checksum
 * @return checksum value as an 8 character string (hex).
```
```java
hashValues.add(hash);
}

System.out.println(hashValues);
```
```java
if (bucketName == null && bucketPrefix != null) {
  // Check if the destination dataset exists.
  DatasetId datasetId = DatasetId.of(config.getDatasetProject(), config.getDataset());
  Dataset dataset = bigQuery.getDataset(datasetId);
```
It seems like we should already be making this call as part of the existing validation/automatic bucket creation. If so, we should do some refactoring to avoid duplicate calls. It's become kind of a mess now, but ideally we do all the I/O in a single place and pass the return objects around.
I don't see an easy way to refactor this without making 2 calls to get the dataset.
The problem is that the bucket name is needed even in preview. This makes it difficult to refactor the code in a way that doesn't change the entire method signature of the abstract BigQuery sink and other callers (such as the BigQuery Pushdown implementation).
I have refactored the code a bit; hope this helps.
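One generic way to address the "do the I/O in one place and pass the return objects around" concern without changing every caller's signature is to memoize the dataset lookup, so validation and bucket-name resolution share a single `bigQuery.getDataset()` call. This is only a sketch of that idea, not the refactor actually merged; `DatasetCache` and `memoize` are hypothetical names, and the `Supplier` stands in for the BigQuery client call.

```java
import java.util.function.Supplier;

// Hypothetical sketch: memoize an expensive lookup so multiple validation
// steps share one underlying call (e.g. bigQuery.getDataset(datasetId)).
public class DatasetCache {
  static <T> Supplier<T> memoize(Supplier<T> lookup) {
    return new Supplier<T>() {
      private T value;
      private boolean done;

      @Override
      public synchronized T get() {
        if (!done) {
          value = lookup.get(); // performed at most once
          done = true;
        }
        return value;
      }
    };
  }
}
```

Callers would receive the memoized `Supplier` instead of a raw `Dataset`, keeping existing signatures close to what they are today.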